Pivot Tables¶

The Pandas pivot_table() is used to calculate, aggregate, and summarize your data. It is defined as a powerful tool that aggregates data with calculations such as Sum, Count, Average, Max, and Min.

It also allows the user to sort and filter your data when the pivot table has been created. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.

  • pandas.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False, sort=True)
In [13]:
import pandas as pd
import numpy as np
In [14]:
df = pd.DataFrame({"A": ["burger", "burger", "burger", "burger", "burger",
                         "fries", "fries", "fries", "fries"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
In [15]:
df
Out[15]:
A B C D E
0 burger one small 1 2
1 burger one large 2 4
2 burger one large 2 5
3 burger two small 3 5
4 burger two small 3 6
5 fries one large 4 6
6 fries one small 5 8
7 fries two small 6 9
8 fries two large 7 9
In [17]:
table = pd.pivot_table(df, index=['A', 'B'])
table
Out[17]:
D E
A B
burger one 1.666667 3.666667
two 3.000000 5.500000
fries one 4.500000 7.000000
two 6.500000 9.000000
In [18]:
table = pd.pivot_table(df, index=['A', 'B'], aggfunc=np.sum)
table
Out[18]:
D E
A B
burger one 5 11
two 6 11
fries one 9 14
two 13 18

This first example aggregates values by taking the sum.¶

In [19]:
table = pd.pivot_table(df, values='D', index=['A', 'B'],
                    columns=['C'], aggfunc=np.sum)
table
Out[19]:
C large small
A B
burger one 4.0 1.0
two NaN 6.0
fries one 4.0 5.0
two 7.0 6.0

We can also fill missing values using the fill_value parameter.¶

In [20]:
table = pd.pivot_table(df, values='D', index=['A', 'B'],
                    columns=['C'], aggfunc=np.sum, fill_value=0)
table
Out[20]:
C large small
A B
burger one 4 1
two 0 6
fries one 4 5
two 7 6

The next example aggregates by taking the mean across multiple columns.¶

In [24]:
table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
                    aggfunc={'D': np.mean,
                             'E': np.mean})
table
Out[24]:
D E
A C
burger large 2.000000 4.500000
small 2.333333 4.333333
fries large 5.500000 7.500000
small 5.500000 8.500000

We can also calculate multiple types of aggregations for any given value column.¶

In [23]:
table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
                    aggfunc={'D': np.mean,
                             'E': [min, max, np.mean]})
table
Out[23]:
D E
mean max mean min
A C
burger large 2.000000 5 4.500000 4
small 2.333333 6 4.333333 2
fries large 5.500000 9 7.500000 6
small 5.500000 9 8.500000 8
In [ ]: